Cognateness, frequency, and vocabulary size

An interactive account of bilingual lexical acquisition

Gonzalo Garcia-Castro
Daniela S. Ávila-Varela
Ignacio Castillejo
Núria Sebastian-Galles

Bilingual word acquisition

Learning words is a foundational step in language development

Learning a word involves associating a linguistic form with its referent(s), which is a complex process

Bilinguals face the challenge of learning more than one word-form per referent

We still don’t know much about how bilingualism impacts word learning

Learning outputs: measuring vocabulary size


Vocabulary checklist: number or proportion of words that caregivers check as Understands and/or Says


English-Spanish bilinguals have smaller English vocabulary sizes than monolinguals, but similar vocabulary sizes when both languages are summed together (Hoff et al. 2012)

Linguistic distance


Bilingual toddlers learning two typologically close languages showed larger vocabulary sizes than those learning more distant languages (Floccia et al. 2018)

Cognate: form-similar translation equivalents

|          | Cognate  | Non-cognate |
|----------|----------|-------------|
| Referent | [cat]    | [dog]       |
| Catalan  | /ˈgat/   | /ˈgos/      |
| Spanish  | /ˈgato/  | /ˈpe.ro/    |

Cognateness facilitates vocabulary growth: mechanisms?

Parallel activation: candidate mechanism?

Lexical access is language non-selective:

Translation equivalents are co-activated, even in monolingual situations

Cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2022)

Dissociation between models of bilingual word processing (parallel activation) and word acquisition


An accumulator model of word acquisition



For participant \(i\) and word \(j\):

\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \end{aligned} \]

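A word is acquired at the youngest age at which its accumulated learning instances reach the threshold:

\[ \begin{aligned} \text{Age of acquisition}_{ij} &= \min \{ \text{Age}_i : \text{Learning instances}_{ij} \geq \text{Threshold}_{ij} \} \end{aligned} \]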

We fix some parameters:

\[ \begin{aligned} \text{Threshold} &= 250 \\ \lambda &= 1 \end{aligned} \]
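A minimal Python sketch of the accumulator, assuming a word counts as acquired at the first age (in months) at which its accumulated learning instances reach the threshold. The function names, the `max_age` cap, and the stdlib Poisson sampler are illustrative assumptions, not the authors' simulation code.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def learning_instances(age, frequency):
    """Learning instances accumulated by a given age: Age * Frequency."""
    return age * frequency


def age_of_acquisition(frequency, threshold, max_age=60):
    """First age (months) at which instances reach the threshold, else None."""
    for age in range(1, max_age + 1):
        if learning_instances(age, frequency) >= threshold:
            return age
    return None  # not acquired within the simulated age range (hypothetical cap)


# Simulate a handful of words with hypothetical settings.
rng = random.Random(1)
frequencies = [sample_poisson(50, rng) for _ in range(5)]
ages = [age_of_acquisition(f, threshold=300) for f in frequencies]
```

Words with higher sampled frequencies cross the threshold at younger ages. A word with frequency 0 never accumulates instances and is returned as `None`.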

Simulating word acquisition

Catalan monolingual (no parallel activation)

| Catalan | Spanish |
|---------|---------|
| 100%    | 0%      |

\[ \begin{aligned} \text{Threshold} &= 300 \\ \text{Frequency}_{j} &\sim \text{Poisson}(\lambda) \\ \lambda &= 50 \end{aligned} \]

Simulating bilingual word acquisition


\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \cdot (c \cdot \text{Similarity}_j) \end{aligned} \]

Simulating bilingual word acquisition

Catalan-Spanish bilingual (no parallel activation)

| Catalan | Spanish |
|---------|---------|
| 60%     | 40%     |

\[ \begin{aligned} \text{Threshold}_{ij} &= 300 \\ \text{Frequency}_{j} &\sim \text{Poisson}(\lambda) \\ \lambda &= 50 \end{aligned} \]

Simulating bilingual word acquisition

Catalan-Spanish bilingual (parallel activation)

| Catalan | Spanish |
|---------|---------|
| 75%     | 25%     |

\[ \begin{aligned} \text{Threshold} &= 250 \\ \text{Freq. per month} &\sim \text{Poisson}(1) \end{aligned} \]

Methods

Questionnaire


  • Online, inspired by the MacArthur-Bates CDI

  • ~1,600 items/words (800 Catalan + 800 Spanish)

  • Participants filled in one of 4 versions of the questionnaire:

    • 500 items: 250 Catalan + 250 Spanish

  • Short-listed (nouns): 302 translation equivalents (TE)

Participants

138,078 responses from 366 participants

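Participants by number of times they completed the questionnaire:

| 1 time | 2 times | 3 times | 4 times |
|--------|---------|---------|---------|
| 312    | 42      | 8       | 4       |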

Modelling


Ordinal regression model: \(P(\text{Understands})\), \(P(\text{Says})\)

  • No < Understands < Understands and Says
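To illustrate how an ordinal model turns one linear predictor into the three ordered responses, here is a small Python sketch. It assumes a cumulative model with a logit link. The link, the cutpoint values, and the function names are assumptions for illustration only, not details given in these slides.

```python
import math


def inv_logit(x):
    """Logistic function mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, cut_understands, cut_says):
    """Cumulative-logit probabilities for No < Understands < Understands and Says.

    eta is the linear predictor; the two cutpoints separate the categories.
    """
    if cut_understands >= cut_says:
        raise ValueError("cutpoints must be strictly increasing")
    p_at_most_no = inv_logit(cut_understands - eta)       # P(Y <= No)
    p_at_most_understands = inv_logit(cut_says - eta)     # P(Y <= Understands)
    return {
        "No": p_at_most_no,
        "Understands": p_at_most_understands - p_at_most_no,
        "Understands and Says": 1.0 - p_at_most_understands,
    }


# Hypothetical values for one participant-word pair.
probs = category_probabilities(eta=0.5, cut_understands=-1.0, cut_says=1.0)
p_understands = probs["Understands"] + probs["Understands and Says"]  # at least Understands
p_says = probs["Understands and Says"]
```

Under this reading, \(P(\text{Understands})\) is the probability of reaching at least the Understands category, and \(P(\text{Says})\) is the probability of the top category.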

Multilevel: crossed random effects

  • Participant and Translation equivalent as grouping variables

Bayesian: probability of parameter values

\[P(\text{model} | \text{data}) \propto P(\text{data} | \text{model}) \times P(\text{model})\]
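The proportionality above can be made concrete with a grid approximation for a single probability. This Python sketch uses made-up data (7 "Understands" responses out of 10), a flat prior, and a binomial likelihood. All of these are illustrative assumptions and are unrelated to the fitted model.

```python
import math


def grid_posterior(successes, trials, grid_size=101):
    """Posterior over a probability p on an evenly spaced grid, flat prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # P(model): flat over the grid
    likelihood = [             # P(data | model): binomial
        math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    unnormalised = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(unnormalised)
    return grid, [value / total for value in unnormalised]


grid, posterior = grid_posterior(successes=7, trials=10)
# Grid value with the highest posterior probability.
most_probable = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

Dividing by the total is what turns "proportional to" into a proper probability distribution over the grid.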

Modelling

Predictors

  • \(\text{Age}_{i}\) in months
  • \(\text{Length}\): number of phonemes
    • e.g., \(\text{cadira}_{\text{cat}}\) /kəˈdi.ɾə/ = 6
  • \(\text{Exposure}_{ij}\): \(\text{Frequency}_{j} \cdot \text{Language exposure}_{i}\)
    • cadira: 6.19 Zipf freq. per million, 0.5 \(\text{DoE}_{cat}\) => 12.37
  • \(\text{Cognateness}_j\): Levenshtein similarity between the word-form \(j\) and its translation

Results

Regression table


Figures

Figure


Summary


Appendix

Item properties

Levenshtein similarity

Phonological similarity

Levenshtein distance: minimum number of single-character edits (insertions, deletions, or substitutions) needed to make two character strings identical

|         | Orthography | Phonology   | String |
|---------|-------------|-------------|--------|
| Catalan | porta       | /ˈpɔɾ.tə/   | pɔɾtə  |
| Spanish | puerta      | /ˈpweɾ.ta/  | pweɾta |

Levenshtein similarity

\[ 1-\frac{\text{lev}(A, B)}{\max(\text{length}(A), \text{length}(B))} \]

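| Catalan             | Spanish               | Levenshtein similarity (distance) |
|---------------------|-----------------------|-----------------------------------|
| porta (/ˈpɔɾ.tə/)   | puerta (/ˈpweɾ.ta/)   | 0.50 (3)                          |
| taula (/ˈtaw.lə/)   | mesa (/ˈmesa/)        | 0.00 (5)                          |
| cotxe (/ˈkɔ.t͡ʃə/)  | coche (/ˈkot͡ʃe/)     | 0.40 (3)                          |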
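The similarity score can be sketched in a few lines of Python. This is a minimal dynamic-programming version that compares strings one Unicode character at a time, so stress marks and syllable dots must be stripped first, and a tie-barred affricate counts as several characters unless it is recoded. The function names are illustrative, not the authors' code.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """1 - lev(a, b) / max(length(a), length(b)); two empty strings count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


porta_puerta = levenshtein_similarity("pɔɾtə", "pweɾta")  # cognate pair
taula_mesa = levenshtein_similarity("tawlə", "mesa")      # non-cognate pair
```

With the transcriptions above, porta/puerta has a distance of 3 over 6 characters (similarity 0.50), and taula/mesa has a distance of 5 over 5 characters (similarity 0.00).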

References

Floccia, Caroline, Thomas D. Sambrook, Claire Delle Luche, Rosa Kwok, Jeremy Goslin, Laurence White, Allegra Cattani, et al. 2018. “I: Introduction.” Monographs of the Society for Research in Child Development 83 (1): 7–29. https://doi.org/10.1111/mono.12348.
Hoff, Erika, Cynthia Core, Silvia Place, Rosario Rumiche, Melissa Señor, and Marisol Parra. 2012. “Dual Language Exposure and Early Bilingual Development.” Journal of Child Language 39 (1): 1–27. https://doi.org/10.1017/S0305000910000759.
Mitchell, Lori, Rachel K. Y. Tsui, and Krista Byers-Heinlein. 2022. “Cognates Are Advantaged in Early Bilingual Expressive Vocabulary Development.” PsyArXiv. https://doi.org/10.31234/osf.io/daktp.
